Robustness of Unidimensional IRT Calibration in the Presence of Essential Dimensionality

Abstract 

Stout (1987, 1990) has argued that the essential dimensionality assumption is a valid substitute
for Lord's unidimensionality assumption (Lord, 1980). The first purpose of this study was to investigate
the stability of two essential dimensionality measures across ten random samples within a particular ATI
selection. The second was to investigate the discrepancy of the essential dimensionality estimates for a test
across different ATI selections and sample sizes. Finally, the third purpose was to investigate the validity
of replacing the IRT unidimensionality assumption with the essential unidimensionality assumption using
the existence of the invariance property of item parameters as a criterion. The results of this study indicated
that the stability of the two essential dimensionality measures was low for some tests across ten random
samples. The correlation between the two essential dimensionality measures was high within the
same sample. The essential dimensionality results for four tests across four different ATI assignments
also differed. This finding indicates that the essential dimensionality estimate for a test is related to the
characteristics of the ATI items. In the second section of the analysis, the effect of reducing the number of
examinees and test items was analyzed. It was found that reducing sample size does not provide consistent
improvement in the degree of essential unidimensionality. Also, reducing the number of test items and
ATI items did not assure unidimensionality. The characteristics of the ATI items likely have more
influence on essential dimensionality estimates. The validity of replacing the IRT unidimensionality
assumption with the essential unidimensionality assumption was assessed in the last section of the analysis
using the invariance of item parameters as evidence. It was consistently found that the relationships between
the existence of the item invariance property and the essentially unidimensional item calibrations (i.e.,
arithmetic and algebra scales) are low across test forms and mathematics areas. Therefore, a further study of
the criteria for ATI items is needed to enhance the validity of replacing the IRT unidimensionality
assumption with the essential unidimensionality assumption.
Item response theory (Lord, 1980) has been widely used in test equating and test bias studies
because of its unique invariance property for item parameters and ability estimates. The invariance property
of IRT holds only when the assumptions of unidimensionality and local independence are true. The
classical unidimensionality assumption, however, has been criticized as unrealistic for real tests (Traub,
1983). Stout (1987, 1990) has provided a weaker and psychologically more meaningful assumption called
essential dimensionality, and has argued that the essential dimensionality assumption is a valid substitute
for Lord's unidimensionality assumption. The strengths of Stout's essential dimensionality concept are his
objective essential dimensionality statistics and the corresponding easy-to-use computer program
(DIMTEST, 1992). Stout's contributions, however, encounter four challenges. First of all, the degree of
essential dimensionality of a group of target items (or a test) depends heavily on the characteristics of the
assessment items (i.e., ATI) that are chosen. For instance, Wang and Hocevar (1994) found that a
group of items in the arithmetic area may be flagged as essentially unidimensional when they are compared
to decimal fraction ATI items, but may be multidimensional when ratio proportion ATI items are chosen.
Secondly, both of Stout's essential dimensionality statistics are subject dependent because the original
dichotomous item responses are used in the DIMTEST program. Therefore, two groups of examinees with
distinct cognitive skills may interact differently with a particular group of ATI items and yield two different
magnitudes of essential dimensionality. Thirdly, the number of items and examinees may also influence
the result of an essential dimensionality analysis, since it may be easier to obtain a unidimensional test with
fewer items as well as fewer subjects. Finally, there is no clear cutoff score that the user can use to
conclude that essential dimensionality is fully equivalent to Lord's unidimensionality assumption.
Beyond all these problems, the stability of the two essential dimensionality statistics across random
samples is also unknown.

There were three purposes to this study. The first purpose was to investigate the stability of the 
two essential dimensionality measures across ten random samples within a particular ATI selection. The 
second was to investigate the discrepancy of the essential dimensionality estimates for a test across different 
ATI selections and sample sizes. Finally, the third purpose was to investigate the validity of replacing the 
IRT unidimensionality assumption with the essential dimensionality assumption using the existence of the 
invariance property of item parameters as a criterion.

Methodology 

Data 

The data for this study were adapted from the Second International Mathematics Study (SIMS) of
the International Association for the Evaluation of Educational Achievement (1985). During 1980 and
1982, the Second IEA International Mathematics Study researchers collected data on mathematics curricula,
teaching practices, and achievement from samples of students, teachers, and schools in 20 countries. SIMS
was conducted at two levels: (1) Population A, in which students were (typically) in the national grade in
which the modal age was 13; and (2) Population B, in which students were taking the most advanced
pre-university mathematics courses offered in their school systems. However, only the U.S. Population A
study of SIMS was considered in the present study. Population A of the U.S. study is defined as the eighth
graders in both main-stream and non-public schools. Mentally, physically, emotionally, or learning
disabled students who were placed into special education classes were not included. Subjects were then
selected by using school type, regional standard metropolitan statistical area (SMSA), location, and
metropolitan status as stratification variables.

Instruments 

There were two major SIMS study designs: longitudinal and cross-sectional. The U.S. was in the
longitudinal study design. The U.S. instrument was an eighth-grade mathematics achievement test
consisting of a total of 180 items, selected from the international bank of 196 items and divided into a
40-item core subtest and four 35-item "rotated forms." The 40 core items were administered to all
examinees, and rotated forms were randomly assigned to students (approximately one-fourth of the students
taking each form). In other words, each student was administered the core and one rotated form for a total of
75 of the 180 items in the pool. Every SIMS achievement test covers five major content areas: arithmetic,
algebra, geometry, statistics, and measurement. There were several subcontent categories under each major
content area. For example, there were eight subcontent areas within the arithmetic area: Natural Numbers
(001), Common Fractions (002), Decimal Fractions (003), Ratios, Proportions, and Percent (004), Number
Theory (005), Power and Exponents (006), Square Roots (008), and Dimensional Analysis (009). Four test
forms of the eighth-grade mathematics test (forms A, B, C, and D) were investigated in the U.S. study and were
recoded as test 1 to test 4. Also, every subtest was relabeled for this study (see Table 1). The first number
of the subtest label denotes the resource test of the subtest, and the extension numbers 1, 2, 3, and 4 denote
arithmetic, algebra, geometry, and measurement, respectively.
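
The booklet arithmetic described above can be checked with a short sketch. The constant and function names below are hypothetical, chosen only for illustration; only the counts (40 core items, four 35-item rotated forms) come from the text.

```python
# Illustrative check of the SIMS U.S. booklet design described in the text.
# Names are hypothetical; only the item counts are taken from the paper.
CORE_ITEMS = 40            # core subtest given to every examinee
ROTATED_FORM_ITEMS = 35    # items in each rotated form
ROTATED_FORMS = ("A", "B", "C", "D")


def pool_size(core: int, per_form: int, n_forms: int) -> int:
    """Total number of distinct items across the core and all rotated forms."""
    return core + per_form * n_forms


def booklet_size(core: int, per_form: int) -> int:
    """Number of items one examinee answers: the core plus one rotated form."""
    return core + per_form


if __name__ == "__main__":
    print(pool_size(CORE_ITEMS, ROTATED_FORM_ITEMS, len(ROTATED_FORMS)))
    print(booklet_size(CORE_ITEMS, ROTATED_FORM_ITEMS))
```

Under these counts the pool holds 180 items and each examinee sees 75 of them, which agrees with the figures reported above.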

In the stability analysis using ten random sample data subsets, only the arithmetic and algebra
tests which had an appropriate number of items for ATI assignment (i.e., common fractions (002), decimal
fractions (003), and ratios (004) for arithmetic, and integers (101), formulas (104), and equations (106) for
algebra) were chosen for analysis. Readers who are interested in the complete content of all SIMS items
should refer to "Technical Report I" of SIMS (Chang & Ruzicka, 1985).

To examine the effect of sample size on DIMTEST estimates, four different numbers of examinees
were chosen (n = 1600, 800, 400, 200). First, samples were selected from the original data with an arbitrary
fixed number of 1600 examinees. This 1600-case dataset was then randomly split into three mutually
exclusive subsamples (i.e., 200, 400, 800).
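
One way to draw mutually exclusive subsamples of this kind is sketched below. This is not the authors' procedure, which the paper does not describe in detail; the function name, the seed, and the use of integer case identifiers are assumptions made for illustration.

```python
import random


def disjoint_subsamples(case_ids, sizes, seed=0):
    """Split case_ids into non-overlapping random subsamples of the given sizes.

    Hypothetical helper: shuffles a copy of the case list once and then
    slices consecutive blocks, so no case can appear in two subsamples.
    """
    if sum(sizes) > len(case_ids):
        raise ValueError("requested subsamples exceed the number of cases")
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    subsamples, start = [], 0
    for size in sizes:
        subsamples.append(shuffled[start:start + size])
        start += size
    return subsamples


if __name__ == "__main__":
    pool = list(range(1600))       # stand-in identifiers for the 1600 cases
    n200, n400, n800 = disjoint_subsamples(pool, [200, 400, 800])
    print(len(n200), len(n400), len(n800))
```

Because 200 + 400 + 800 = 1400, the three subsamples fit inside the 1600-case dataset without overlap.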






Table 1. Labels of Tests and Subtests in SIMS U.S. Study.

Test Recode   Items   N      Description
1             75      1652   Test A
1.1           28      1652   Subtest of test A (Arithmetic)
1.2           14      1652   Subtest of test A (Algebra)
1.3           17      1652   Subtest of test A (Geometry)
1.4           12      1652   Subtest of test A (Measurement)
2             75      1610   Test B
2.1           28      1610   Subtest of test B (Arithmetic)
2.2           14      1610   Subtest of test B (Algebra)
2.3           17      1610   Subtest of test B (Geometry)
2.4           12      1610   Subtest of test B (Measurement)
3             75      1668   Test C
3.1           27      1668   Subtest of test C (Arithmetic)
3.2           14      1668   Subtest of test C (Algebra)
3.3           16      1668   Subtest of test C (Geometry)
3.4           13      1668   Subtest of test C (Measurement)
4             75      1619   Test D
4.1           27      1619   Subtest of test D (Arithmetic)
4.2           14      1619   Subtest of test D (Algebra)
4.3           16      1619   Subtest of test D (Geometry)
4.4           13      1619   Subtest of test D (Measurement)


To investigate the effects of reducing the number of ATI items on DIMTEST estimates, two types
of subsamples were generated. In substudy one, the first 28 items were chosen from each of the 75-item tests 1, 2,
3, and 4, and the essential dimensionality estimates for these subtests were calculated using 3 randomly
selected ATI items. The purpose of this substudy was to determine whether the previously reported existence of
unidimensionality in more content-specific tests (see Wang & Hocevar, 1994) is related to a reduction of the
number of ATI items. In the second substudy, the effect of reducing the number of ATI items was
investigated more systematically using three different numbers of randomly selected ATI items: 12, 8,
and 4.

To examine the invariance property of item parameters in an essentially unidimensional test,
arithmetic and algebra items in tests 1, 2, 3, and 4 were examined using two randomly selected, mutually
exclusive, split-half samples, each with one-half of the original sample of examinees.

Analysis

There are three sections in the analysis. In the first section, a replication study design was applied
to test the reliability of Stout's statistics across random samples. That is, ten sample data sets with
approximately one-fourth of the original sample size of examinees (N = 400) were randomly selected from
the original data for each test. Furthermore, there were three ATI selections (common fractions, decimal
fractions, and ratios) for each of the four arithmetic tests. Along the same lines, there were three algebra
ATI selections (integers, formulas, and equations) for each of the four algebra tests. The essential
dimensionality of both arithmetic and algebra for the four different test forms (i.e., tests 1, 2, 3, and 4) was
estimated ten times, once for each random sample described earlier. The means, standard deviations, and ranges
of the ten replicated Stout's essential dimensionality statistics for arithmetic and algebra tests were
calculated and used as indices to assess the consistency of DIMTEST across random samples.
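
The replication indices described above (mean, standard deviation, range, and the proportion of replications flagged as essentially unidimensional) can be sketched as follows. The helper name is hypothetical, and several choices are assumptions rather than details given in the paper: the sample (n - 1) standard deviation, a one-sided upper-tail standard normal reference for T, and a .05 significance level. The T values in the usage example are placeholders, not data from the study.

```python
from math import erfc, sqrt
from statistics import mean, stdev


def summarize_replications(t_values, alpha=0.05):
    """Summarize replicated T statistics for one test and one ATI selection.

    Assumptions (not stated in the paper): sample SD with n - 1 in the
    denominator, upper-tail standard normal p-values, and a replication
    counted as essentially unidimensional when its p-value is >= alpha.
    """
    p_values = [0.5 * erfc(t / sqrt(2.0)) for t in t_values]
    n_unidimensional = sum(p >= alpha for p in p_values)
    return {
        "mean": mean(t_values),
        "sd": stdev(t_values),
        "range": max(t_values) - min(t_values),
        "prop_unidimensional": n_unidimensional / len(t_values),
    }


if __name__ == "__main__":
    # Placeholder values for illustration only; not taken from the study.
    example_t = [0.0, 1.0, 2.0, 3.0]
    print(summarize_replications(example_t))
```

With ten replications per cell, the proportion can only take values in steps of .10, which is consistent with the acceptance rates reported in Tables 2 and 3.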

To examine the effect of sample size on the essential dimensionality estimates, four mutually
exclusive random subsamples from four SIMS original tests with 200, 400, 800, and 1600 cases were
investigated in the second section of the analysis. The measurement items in each test were chosen as ATI
items and fixed across the four subsamples. The effect of reducing the number of ATI items on DIMTEST
also was investigated in this section of the analysis by matching the number of total items and ATI items
between the general mathematics tests (tests 1, 2, 3, 4) and content-specific tests (tests 1.1, 2.1, 3.1, and 4.1).

The idea of the essential dimensionality statistic is to assess whether there is a dominant trait
measured by the test. Stout (1990) suggests that multidimensional item characteristics and abilities are
suitable for unidimensional IRT as long as there is a dominant trait. Based on this argument, the item
invariance property should hold when unidimensionality is replaced with essential dimensionality. The
purpose of the last section of this study was to determine the equivalence between the IRT
unidimensionality assumption and essential dimensionality by investigating the relationship between
essential unidimensionality and the item invariance property. The item invariance property was then
investigated for the essentially multidimensional test 1 and the essentially unidimensional arithmetic and
algebra tests within test 1 (i.e., tests 1.1 and 1.2) using Lord's chi-square (1980) and Raju's two exact area
measures (1988) to assess invariance.
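
The paper does not reproduce the formulas for the area measures. As a rough sketch, the closed-form signed and unsigned areas between two two-parameter logistic item characteristic curves, as they are commonly attributed to Raju (1988), can be computed as below. The scaling constant D = 1.7, the function names, and the parameter values in the usage example are assumptions for illustration; Lord's chi-square, which also requires the sampling covariance of the parameter estimates, is not shown.

```python
import math

D = 1.7  # logistic scaling constant; an assumption, not stated in the paper


def p_2pl(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def signed_area(b1, b2):
    """Signed area between two 2PL curves; depends only on the difficulties."""
    return b2 - b1


def _softplus(x):
    """log(1 + exp(x)) evaluated without overflow for large |x|."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def unsigned_area(a1, b1, a2, b2):
    """Unsigned area between two 2PL curves over the whole ability scale."""
    if math.isclose(a1, a2):
        # Parallel curves never cross, so the area is the difficulty shift.
        return abs(b2 - b1)
    k = D * a1 * a2 * (b2 - b1) / (a2 - a1)
    return abs(2.0 * (a2 - a1) / (D * a1 * a2) * _softplus(k) - (b2 - b1))


if __name__ == "__main__":
    # Hypothetical parameter estimates from two split-half calibrations.
    print(signed_area(0.0, 1.0))
    print(unsigned_area(1.0, 0.0, 2.0, 1.0))
```

When the two discriminations are equal the unsigned area reduces to the absolute difference in difficulty, so the two measures differ only for items whose curves cross.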

The three corresponding research questions are: first, are the two measures of essential dimensionality
stable across samples? Second, do tests with smaller samples and tests with fewer ATI items have a higher degree
of essential unidimensionality? Third, is the degree of fit of the item invariance property for a
test associated with the degree of essential unidimensionality of the test?

Results 

Stability of DIMTEST

Table 2 displays the means and standard deviations of the two essential
dimensionality statistics for the four arithmetic tests, 1.1 to 4.1, with three arithmetic subarea ATI item sets
using ten randomly selected samples. Surprisingly, both of Stout's essential dimensionality T estimates
vary across random samples. To illustrate, the range of Stout's T and T' measures is found to be as high as
3.00 and 3.66, respectively. Further, some tests were identified as essentially multidimensional almost
as many times as they were identified as essentially unidimensional within the same ATI situation. For
example, test 1.1 was identified as essentially unidimensional only six times out of a total of ten trials
using common fraction (002) ATI items. This result indicates that the stability of the essential dimensionality
statistics across random samples is fairly low.

As in a previous study (Wang & Hocevar, 1994), the degree of essential dimensionality for a
test was found to be highly associated with the characteristics of the ATI items. For example, test 1.1
was identified as essentially unidimensional 60%, 90%, and 80% of the time when common
fractions, decimal fractions, and ratio ATI items were used, respectively (refer to Table 2). In other words, the degree
of essential dimensionality of a test depends on the set of ATI items that was assigned.

In addition, it is noteworthy that the difference between the original and refined T statistics is
negligible, except in the case of test 2.1 with common fractions ATI items. However, Stout's original T
statistic tends to be more reliable than its refined counterpart (T'), as shown by its smaller range and SD
across the ten estimates. The refined essential dimensionality statistic, on the other hand, tends to be more
powerful. A detailed comparison study of these two statistics is needed for a more conclusive interpretation
of their results.

Furthermore, Table 2 demonstrates that the degree of essential dimensionality for the four test
forms is not the same. That is, tests 3.1 and 4.1 generally were found to be more essentially
unidimensional than tests 1.1 and 2.1. This discrepancy may be attributed to the effects of test forms. In
other words, the discrepancy in essential dimensionality among the four tests may be due to different item
compositions in the four tests. However, it is important to point out that the four forms were created
randomly.

Table 3 presents the results of an identical analysis on four algebra tests using three different ATI
assignments. It is shown that even though the magnitude of the largest range of Stout's two statistics
(2.62 and 3.40 in the algebra tests) is about the same as the largest magnitude in the arithmetic subtests
displayed earlier, the acceptance rates of the essential dimensionality assumption for the algebra tests are
higher and more consistent than those for the arithmetic tests. This result implies that the degree of essential
unidimensionality for the four algebra tests is higher than for their four arithmetic counterparts. However, test 4.2
is the only test that was identified as essentially unidimensional across all three ATI selections. The degree
of consistency of essential unidimensionality estimates across the three ATI forms is also higher than in
the arithmetic counterpart. These findings indicate that either the SIMS algebra subtests are more essentially
unidimensional than the arithmetic counterparts or the U.S. students' cognitive abilities in algebra are more
homogeneous than their cognitive abilities in arithmetic.

Again, comparing Stout's original essential dimensionality measure with its refined counterpart,
both Table 2 and Table 3 show that Stout's original statistic is more consistent than the refined statistic, as
indicated by a smaller standard deviation and range. However, as mentioned immediately above, highly
consistent unidimensional "flags" were noted on virtually all algebra tests (Table 3) by both T and the
refined T'.

Table 2.

Resource
Test       ATI           Mean T   SD T    Mean T'  SD T'   R T*    R T'    P T**   P T'
1.1        Com. Frac.     1.11    0.99     1.35    1.32    2.93    3.53    0.60    0.60
2.1        Com. Frac.     0.78    0.%      1.00    1.22    3.00    3.66    0.90    0.60
3.1        Com. Frac.     0.51    0.50     0.64    0.59    1.51    1.83    1.00    0.90
4.1        Com. Frac.     0.38    0.56     0.43    0.65    1.59    1.87    1.00    1.00
1.1        Deci. Frac.    0.73    0.54     0.87    0.67    1.92    2.37    0.90    0.90
2.1        Deci. Frac.   -0.31    0.64    -0.33    0.76    1.62    1.92    1.00    1.00
3.1        Deci. Frac.    0.06    0.45     0.08    0.54    1.50    1.81    1.00    1.00
4.1        Deci. Frac.    0.57    0.64     0.71    0.81    1.77    2.34    1.00    0.90
1.1        Ratio Prop.    0.74    0.85     0.79    1.01    2.61    3.25    0.90    0.80
2.1        Ratio Prop.    1.16    0.47     1.32    0.56    2.77    3.27    0.90    0.70
3.1        Ratio Prop.    1.12    0.73     1.26    0.88    2.65    2.99    0.80    0.70
4.1        Ratio Prop.    1.41    0.60     1.59    0.70    2.49    2.92    0.70    0.60

* R denotes the range between maximum and minimum estimates.
** P denotes the proportion of times the test was flagged as essentially unidimensional.



Table 3. Means, Standard Deviations, and Acceptance Rates for Essential Dimensionality

Resource
Test       ATI           Mean T   SD T    Mean T'  SD T'   R T     R T'    P T     P T'
1.2        Integ.         0.31    0.32     0.34    0.41    0.97    1.28    1.00    1.00
2.2        Integ.         0.91    0.60     1.07    0.75    2.42    2.85    0.90    0.90
3.2        Integ.         0.49    0.31     0.63    0.40    1.00    1.29    1.00    1.00
4.2        Integ.         0.46    0.43     0.49    0.50    1.57    1.77    1.00    1.00
1.2        Form.          0.26    0.77     0.35    0.97    2.62    3.40    1.00    0.90
2.2        Form.         -0.40    0.62    -0.48    0.73    1.79    2.07    1.00    1.00
3.2        Form.         -0.47    0.86    -0.56    1.00    2.81    3.27    1.00    1.00
4.2        Form.         -0.26    0.56    -0.37    0.75    1.67    2.14    1.00    1.00
1.2        Equa.         -0.45    0.60    -0.44    0.71    1.68    1.99    1.00    1.00
2.2        Equa.         -0.23    0.52    -0.25    0.61    1.61    1.85    1.00    1.00
3.2        Equa.          0.01    0.91     0.05    1.10    2.91    3.52    0.90    0.90
4.2        Equa.          0.62    0.40     0.77    0.49    1.13    1.45    1.00    1.00

Effect of Sample Size and ATI Size

The effect of item size on essential dimensionality measures was examined in the following
analyses. To examine the effect of sample size on DIMTEST estimates, four different numbers of U.S.
examinees were selected for each test. First, a sample was selected from the original data for general
mathematics achievement test 1 with an arbitrary fixed number of examinees, which was 1600. The
1600-case dataset was then randomly split into three mutually exclusive subsamples (i.e., 200, 400, 800). That
is, datasets were selected for test 1 with sample sizes of 1600, 800, 400, and 200, labeled as subsamples
test 1A, 1B, 1C, and 1D, respectively. Tests 2A to 2D are subsamples of test 2, and so on.

Table 4.1 presents the essential dimensionality assessment for the four SIMS tests using the four
different sample sizes. Test 1 was flagged as essentially unidimensional in the test 1A and 1B situations,
and DIMTEST encountered estimation problems when the sample size was reduced to 400 for test 1. The
degree of the essential multidimensionality slightly increased when the sample size decreased from 1600 to
800 in test 1. Test 2 was flagged as multidimensional when the sample size was 1600 and 800, but the test
was identified as essentially unidimensional when the sample size was reduced to 400. The dimensionality
result for test 3, however, shows that the degree of essential unidimensionality increased (from p = .07 to
p = .15) when the sample size was reduced from 1600 to 800, but the p-value decreased to .06 when the
sample size was reduced to 400. The effect of sample size on the essential dimensionality for test 4 has
the opposite pattern; that is, the essential dimensionality decreased first when the sample size was reduced to 800
and returned to a level similar to that for size 1600 when the sample size was reduced
to 400. This result indicates that a smaller sample size may increase as well as decrease the degree of the
essential unidimensionality estimate.
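
The p-values quoted above appear consistent with treating T as a standard normal statistic and taking the upper tail. That reference distribution is an assumption inferred from the reported T and p pairs in Table 4.1, not something the paper states, and the function name is hypothetical. A minimal sketch:

```python
from math import erfc, sqrt


def upper_tail_p(t):
    """Upper-tail standard normal probability, P(Z > t).

    Assumes T is referred to a standard normal distribution with large
    positive values counting against essential unidimensionality.
    """
    return 0.5 * erfc(t / sqrt(2.0))


if __name__ == "__main__":
    # T values reported in Table 4.1 for subsamples 1A, 2A, 2B, and 3A.
    for t in (-0.11, 2.63, 2.08, 1.45):
        print(f"T = {t:5.2f}  p = {upper_tail_p(t):.2f}")
```

Rounded to two decimals, these four values agree with the tabled p-values (.54, .00, .02, and .07); not every tabled pair need match exactly, since the reported T values are themselves rounded.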

The previous results suggest two conclusions. First, changes in sample size affect DIMTEST
estimates in an unpredictable, but fairly minor, way. That is, changes in the estimates may be totally
attributable to normal sampling error. Second, it is noteworthy that small sample sizes caused
convergence problems on some occasions for DIMTEST. Thus, it appears that N = 400 might be considered
a minimum sample size for DIMTEST.

To investigate whether the previously reported greater degree of essential unidimensionality in the
analysis of arithmetic and algebra tests (see Wang & Hocevar, 1994) is due to a smaller number of items in
these analyses, a new multidimensional general achievement test was generated by arbitrarily assigning the
first twenty-eight items from tests 1 to 4. The essential dimensionality estimates were calculated four
times for each new test using four different sets of three randomly selected ATI items. The goal of this
analysis was to determine the effects of reducing the number of total items on DIMTEST estimates. Table
4.2 presents the essential dimensionality results for the 28-item subtests. Only one out of a total of sixteen
trials shows the predicted significant result. According to this result, a general achievement test with
multidimensional items may be flagged as unidimensional because the number of ATI items is small.
Moreover, the p-values across trials were not highly consistent. For instance, the four p-values
for test 2 fall into the range from .90 to .15. Test 3 was identified as multidimensional (with a p-value
of .02) when the first three items in the test were ATI items, but was flagged as unidimensional (p-value
of .79) when other items were ATI items. This discrepancy, again, shows that the actual ATI
characteristics have a significant effect on the essential dimensionality estimates.

Table 4.1. Essential Dimensionality for Four SIMS General Mathematics Achievement Tests and
Subsamples.

Test   Sample size      T       P        T'      P'

Subsample of test 1
1A     1600            -.11    .54      -.12    .55
1B      800            -.20    .58      -.23    .59
1C      400            Error due to small sample size.
1D      200            Error due to small sample size.

Subsample of test 2
2A     1600            2.63    .00**    3.08    .00**
2B      800            2.08    .02*     2.50    .00**
2C      400            1.10    .13      1.22    .11
2D      200            Error due to small sample size.

Subsample of test 3
3A     1600            1.45    .07      1.77    .04*
3B      800            1.04    .15      1.33    .09
3C      400            1.52    .06      2.09    .02*
3D      200            Error due to small sample size.

Subsample of test 4
4A     1600            -.47    .68      -.62    .73
4B      800             .15    .44       .12    .45
4C      400            -.45    .68      -.52    .70
4D      200            Error due to small sample size.


Table 4.2. Essential Dimensionality for Four 28-item General Tests with 3 Randomly Selected ATI.

Target Test
(#ATI/#Total)    ATI       Stout's T   P-value   Refined T   P-value
Test 1 (3/28)    Random       .37       .35         .44       .33
                 Random      -.36       .64        -.56       .71
                 Random       .60       .27         .85       .20
                 Random      -.62       .73        -.72       .77
Test 2 (3/28)    Random       .24       .40         .31       .38
                 Random       .38       .35         .39       .35
                 Random       .78       .22        1.00       .15
                 Random      -.99       .83       -1.27       .90
Test 3 (3/28)    Random      1.51       .07        2.02       .02*
                 Random       .36       .36         .49       .31
                 Random      -.52       .70        -.66       .75
                 Random      -.66       .75        -.82       .79
Test 4 (3/28)    Random       .83       .20        1.14       .13
                 Random      1.28       .10        1.64       .05
                 Random      -.03                   .00       .50
                 Random      -.93       .83       -1.21       .89

Table 4.3 shows the effects of reducing ATI items from 12 to 8 and then to 4 on the essential
dimensionality estimates for the original general tests 1 to 4. According to the results, three of four tests
were multidimensional when the number of randomly selected ATI items equaled 12. When the number of
ATI items was 8 or 4, none of the tests was multidimensional except tests 1 and 2 with 4 ATI items. Thus, these
results suggest that reducing the number of ATI items may increase the possibility of inaccurately
concluding that a test is unidimensional. Taken together, the analyses in Tables 4.2 and 4.3 do suggest that
using either a smaller number of total items (Table 4.2) or a smaller number of ATI items (Table 4.3) does
increase the chance that a multidimensional test (i.e., mathematics general achievement) will be identified as
unidimensional.

Unfortunately, the analyses reported in Tables 4.2 and 4.3 also are confounded because the selection
of ATI items was random. Prior analyses (e.g., Table 2 and Table 3) clearly suggest that a conclusion
supporting unidimensionality depends on the nature of the arbitrarily selected ATI items. It is intuitively
reasonable that selecting ATI items at random will result in stronger support for unidimensionality because
the ATI standard is itself more heterogeneous. To test this hypothesis, an additional analysis, analogous to
that reported in Table 4.2, is shown in Table 4.4. In this analysis, three ATI items within each of four
mathematics subcontent areas (decimal fractions, ratios, equations, and estimations) were selected to be
homogeneous ATI items (as recommended by Stout).

Table 4.4 shows that only four out of a total of sixteen trials of dimensionality assessment with
homogeneous ATI items demonstrated significant results. In other words, increasing the homogeneity of
ATI items does not produce uniformly significant essential dimensionality estimates. However, the four
28-item general tests were all flagged as multidimensional when decimal fractions items were the ATI
items. This finding suggests, to some extent, that the actual characteristics of ATI items have a stronger
influence on the essential dimensionality estimates than the number of ATI items.

Table 4.3. Essential Dimensionality for Four SIMS General Tests with Three Sizes of ATI

Target Test
(#Total)       ATI    Stout's T   P-value   Refined T   P-value
Test 1 (75)    12       -.03       .51         .00       .50
                8       -.93       .83       -1.21       .89
                4       1.83       .03*       2.01       .03*
Test 2 (75)    12                  .00**                 .00
                8        .n        .45         .05       .48
                4       2.36       .01*       2.96       .01*
Test 3 (75)    12       2.05       .02*       2.34       .01*
                8        .29       .38         .25       .40
                4        .45       .33         .64       .26
Test 4 (75)    12       2.83       .00**      3.37       .00**
                8        .08       .46         .16       .42
                4        .74       .23         .94       .17


Table 4.4. Essential Dimensionality Statistics for Four 28-item General Tests with 3 Homogeneous ATI
Items.

Target Test
(#ATI/#Total)    ATI            Stout's T   P-value   Refined T   P-value
Test 1 (3/28)    Deci. Frac.      2.14       .02*       2.72       .00**
                 Ratios            .53       .30         .72       .23
                 Equations        -.51       .70        -.72       .76
                 Estimations       .01       .50         .05       .48
Test 2 (3/28)    Deci. Frac.      1.79       .04*       2.25       .01*
                 Ratios            .53       .30         .64       .26
                 Equations         .83       .20        1.09       .14
                 Estimations       .82       .21        1.10       .14
Test 3 (3/28)    Deci. Frac.      1.63       .05*       2.09       .02*
                 Ratios           -.68       .75        -.82       .79
                 Equations         .69       .25         .88       .19
                 Estimations       .14       .44         .23       .41
Test 4 (3/28)    Deci. Frac.      1.28       .10        1.64       .05*
                 Ratios            .92       .18        1.18       .12
                 Equations         .09       .46         .11       .46
                 Estimations       .72       .24        1.00       .16

Robustness of DIMTEST

To investigate the robustness of the essential dimensionality statistics, the relationship between
the degree of essential dimensionality of a test and the existence of the item invariance property in the test
was examined. Two levels of tests were used in this analysis. Tests 1.1 and 1.2, arithmetic and algebra,
were treated as essentially unidimensional tests based on the results of Wang and Hocevar (1994), while the
general mathematics test 1 was treated as multidimensional. Item parameters were estimated using a
two-parameter logistic model (2-PL) with a Bayes estimation procedure, which assumes a prior normal distribution
of ability.

Previous studies have indicated that violating the unidimensionality assumption produces a substantial
lack of item parameter invariance (Ackerman, 1991; Oshima & Miller, 1990). Because Stout (1990)
suggested that the essential dimensionality assumption is a valid substitute for the IRT
unidimensionality assumption, the item invariance property should hold better for essentially
unidimensional tests than for multidimensional tests. In other words, the item invariance property was
expected to hold significantly better for Tests 1.1 and 1.2 than for general Test 1.

Table 5.1 presents the results of the IRT item invariance examination for the essentially
unidimensional arithmetic test. An item was flagged as lacking the item invariance property when any
one of the three invariance measures was significant. Unexpectedly, 7 of the 28 arithmetic items
lacked item invariance even though the test was essentially unidimensional. This result calls for
reconsidering the equivalence between the essential dimensionality assumption and the IRT
unidimensionality assumption, and the appropriateness of using the former as a substitute for the
latter.

Statistically, Raju's exact unsigned area measure is much more sensitive than the other two measures,
and this may introduce some spurious detections. Furthermore, only three items were identified by all
three statistics as violating the item invariance property. The correlation between Lord's chi-square
and Raju's signed area measure is higher than that for the other two possible pairings.
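
To illustrate why the unsigned measure can flag items that the signed measure misses, the Python sketch below computes the raw signed and unsigned areas between two 2-PL curves using the closed-form expressions usually attributed to Raju (1988). It is an illustrative sketch rather than the procedure used in this study: the ESA and EUA columns in the tables are test statistics, not raw areas, and the scaling constant D = 1.7 and the function names are assumptions.

```python
import math


def signed_area(b1: float, b2: float) -> float:
    """Signed area between two 2-PL curves; depends only on the difficulties."""
    return b2 - b1


def unsigned_area(a1: float, b1: float, a2: float, b2: float, scale: float = 1.7) -> float:
    """Unsigned area between two 2-PL curves (closed form; scale is an assumed D)."""
    if math.isclose(a1, a2):
        # Parallel curves never cross, so the unsigned area is |b2 - b1|.
        return abs(b2 - b1)
    k = 2.0 * (a2 - a1) / (scale * a1 * a2)
    x = scale * a1 * a2 * (b2 - b1) / (a2 - a1)
    # Numerically stable evaluation of log(1 + exp(x)).
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return abs(k * softplus - (b2 - b1))


# Item ys140 in Table 5.1: equal difficulties, different discriminations.
a1, b1, a2, b2 = 1.07, -0.63, 0.70, -0.63
print("signed area:  ", signed_area(b1, b2))
print("unsigned area:", round(unsigned_area(a1, b1, a2, b2), 3))
```

For this item the signed area is zero because the two curves cross and the positive and negative regions cancel, while the unsigned area is not, which matches the pattern of a nonsignificant ESA and a significant EUA reported for the item.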

Table 5.2 presents a similar analysis of the essentially unidimensional algebra Test 1.2. The
invariance property does not hold for 4 of the 14 algebra items, which were calibrated on an
essentially unidimensional "algebra" scale. In three of the four cases, all three statistics indicated
a lack of invariance. These results indicate that Stout's essential dimensionality is not a sufficient
condition for the existence of the invariance property of the item parameters. However, this criticism
is somewhat qualified in that only six of the forty-two items were identified as lacking invariance by
all three invariance indices.
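
The flagging rule described above (an item is flagged when any one of the three indices is significant) and the stricter "all three agree" count can be sketched as follows in Python. The p-values are copied from six legible rows of Table 5.2, with item labels as read from the scan. The function name flag_item and the strict p < .05 comparison are assumptions; the tabled p-values are rounded, so borderline cases cannot be checked this way.

```python
ALPHA = 0.05  # significance level indicated by the asterisks in the tables


def flag_item(p_values, alpha=ALPHA):
    """Return (flagged_by_any, flagged_by_all) for one item's three p-values."""
    hits = [p < alpha for p in p_values]
    return any(hits), all(hits)


# (p for Lord's chi-square, p for ESA, p for EUA) from Table 5.2.
table_5_2_rows = {
    "ys086": (0.94, 0.75, 0.75),
    "ys196": (0.72, 0.42, 0.42),
    "ys019": (0.81, 0.83, 0.65),
    "ys084": (0.27, 0.24, 0.13),
    "ys115": (0.15, 0.30, 0.04),
    "ys053": (0.04, 0.01, 0.01),
}

flags = {item: flag_item(p) for item, p in table_5_2_rows.items()}
n_any = sum(any_hit for any_hit, _ in flags.values())
n_all = sum(all_hit for _, all_hit in flags.values())
print(f"flagged by at least one index: {n_any}; flagged by all three: {n_all}")
```
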
Table 5.1. Item Parameter Estimates a and b and Three Item Invariance Indices for Essentially
Unidimensional Arithmetic Test

                 S1              S2
Item          a       b       a       b      χ²  p         ESA  p         EUA  p

ys076      1.41   -0.27    1.15   -0.36    2.90  0.23    -0.92  0.35    -1.66  0.09
ys075      0.86    0.90    1.00    1.13    8.26  0.01*    1.36  0.17     2.51  0.01*
ys003      0.92    0.27    0.93    0.64    9.40  0.00*    2.80  0.00*    2.80  0.00*
ys043      1.22   -0.33    1.11   -0.35    4.43  0.10     1.72  0.08    -2.03  0.04*
ys045      0.80    1.18    0.65    1.52    2.02  0.36     1.35  0.17    -1.36  0.17
ys109      1.86   -1.05    1.62   -0.95    4.80  0.09     1.03  0.29    -2.13  0.03*
ys005      1.37   -0.28    1.45   -0.17    1.69  0.42     1.24  0.21     1.28  0.19
ys140      1.07   -0.63    0.70   -0.63    9.60  0.00*    0.02  0.97    -2.35  0.01*
ys189      1.34   -0.48    1.23   -0.37    2.28  0.31     1.14  0.25    -1.40  0.15
ys079      2.18   -0.48    2.28   -0.40    1.55  0.45     1.23  0.21        ?  0.91
ys181      1.51    0.17    1.53    0.26    1.22  0.54     1.03  0.30     1.03  0.??
ys190      1.01   -0.26    1.01    0.09   10.00  0.00*    3.15  0.00*   -3.15  0.00*
ys008      1.28    0.60    1.31    0.50    0.74  0.68    -0.83  0.40     0.82  0.41
ys179      1.20   -0.08    1.11   -0.04    0.48  0.78     0.44  0.65    -0.65  0.51
ys009      1.28    0.49    1.30    0.64    2.67  0.26     1.38  0.16     1.38  0.16
ys046      1.27    0.02    1.35   -0.03    0.53  0.76    -0.63  0.52     0.66  0.50
ys042      1.16   -1.36    1.03   -1.32    1.77  0.41     0.19  0.84    -0.??  0.39
ys074      1.42   -0.77    1.09   -0.80    4.22  0.12    -0.23  0.81    -1.85  0.06
ys106      1.64   -0.68    1.69   -0.39    1.01  0.60     1.00  0.31     0.99  0.32
ys185      0.70    1.91    0.71    2.06    1.10  0.57     0.45  0.64     0.47  0.63
ys187      0.96    0.61    1.04    0.72    1.96  0.37     0.81  0.41     1.22  0.21
ys077      0.68   -0.72    0.54   -0.36    4.28  0.11     0.81  0.41    -1.36  0.17
ys108      1.34   -0.65    1.30   -0.38    9.27  0.00*    2.77  0.00*   -2.77  0.00*
ys142      1.10    1.85    1.02    1.91    0.37  0.82     0.24  0.80    -0.44  0.65
ys191      1.00   -0.77    0.85   -0.86    1.15  0.56    -0.55  0.58    -1.04  0.29
ys192      1.20   -0.02    1.12    0.13    2.51  0.28     1.55  0.12    -1.30  0.13
ys048      1.24    0.24    1.42    0.21    1.19  0.54    -0.32  0.74     1.08  0.27
ys011      1.02   -0.30    1.14   -0.12    3.34  0.18     1.70  0.08     1.59  0.11

Note. S1 and S2 denote replications 1 and 2, each of which consists of approximately half of the
original sample.
* p < .05.



Table 5.2. Item Parameter Estimates a and b and Three Item Invariance Indices for Essentially
Unidimensional Algebra Test 1.2

                 S1              S2
Item          a       b       a       b      χ²  p         ESA  p         EUA  p

ys0?2      ?.95    ?.26    1.59   -0.34    2.97  0.22    -0.88  0.37    -1.67  0.09
ys0?3      1.29    0.11    1.21    0.28    2.29  0.31     1.50  0.13        ?  ?
ys1??         ?       ?    1.44   -0.22    3.24  0.19     1.48  0.13     1.61  0.10
ys151      1.00    1.51    0.88    1.78    1.26  0.53     1.12  0.26    -1.06  0.28
ys???      1.29       ?    1.32   -0.72    6.69  0.03*    2.27  0.02*    2.27  0.02*
ys086      1.12    0.01    1.13    0.05    0.10  0.94     0.31  0.75     0.31  0.75
ys196      1.02   -0.39    1.03   -0.29    0.65  0.72     0.79  0.42     0.79  0.42
ys019      0.76    0.81    0.71    0.77    0.39  0.81    -0.20  0.83    -0.44  0.65
ys???         ?       ?    1.12   -0.01    6.13  0.04*    2.35  0.01*   -2.42  0.01*
ys084      0.67    1.23    0.49    1.61    2.60  0.27     1.17  0.24    -1.49  0.13
ys???      1.??    0.03    1.68    0.12    0.94  0.62     0.92  0.35     0.92  0.35
ys115      2.30    0.30    1.75    0.40    3.77  0.15     1.03  0.30    -2.01  0.04*
ys053      1.76   -0.39    1.69   -0.16    6.40  0.04*    2.51  0.01*   -2.51  0.01*
ys087      0.54    2.90    0.41    3.94    1.40  0.49     1.38  0.23    -1.11  ?
Table 5.3 presents the examination of the item parameter invariance property for multidimensional
Test 1. It was expected that the item invariance property would fit an essentially unidimensional test
better than a multidimensional test. Therefore, every item violating the invariance property in an
essentially unidimensional test should also violate it in the multidimensional situation.
Unfortunately, the expected result did not occur. Only one-half of the twelve target items that
violated the invariance property in the two earlier unidimensional calibrations were identified as
violating it in the multidimensional calibration. Most of the remaining six target items show
moderately good fit to the item invariance property. In addition, the proportion of items violating the
item parameter invariance property in the multidimensional test (i.e., 23/75, or approximately .31) is
not dramatically higher than the proportions in the unidimensional arithmetic test (i.e., 7/28, or .25)
and algebra test (i.e., 4/14, or approximately .29). This result indicates that it may not be valid to
use the essential dimensionality of a test as an index of the appropriateness of unidimensional item
calibration for that test.
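
The proportions quoted above, and the corresponding percentages of items that fit the invariance property, follow directly from the counts in the text. A minimal Python check is sketched below; the dictionary keys and the function name violation_rate are illustrative labels only.

```python
# (items flagged as violating invariance, total items), as reported in the text.
counts = {
    "multidimensional Test 1": (23, 75),
    "arithmetic test": (7, 28),
    "algebra test": (4, 14),
}


def violation_rate(flagged: int, total: int) -> float:
    """Proportion of items flagged as lacking invariance."""
    return flagged / total


for label, (flagged, total) in counts.items():
    rate = violation_rate(flagged, total)
    print(f"{label}: violating = {rate:.2f}, fitting = {100 * (1 - rate):.0f}%")
```
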

To explore the level of SIMS mathematics content at which the item invariance property holds, the
previous essentially unidimensional tests (i.e., the arithmetic and algebra tests) were further split
into three subscales. In this analysis three subtests (common fractions, decimal fractions, and ratios)
were selected from Tests 3.1, 2.1, and 4.1, respectively. Items within the same arithmetic subarea were
calibrated on a "unidimensional" arithmetic subscale. The existence of the item parameter invariance
property for these calibrations was examined, and the results are presented in Tables 6.1 to 6.3.

Table 6.1 shows that the invariance property of the item parameters holds perfectly at this level of
item calibration; that is, all six common fraction items in Test 3.1 fit the item invariance property.
The correlations between the three invariance indices, however, are still low. The results for the
seven decimal fraction items in Test 2.1 are shown in Table 6.2. The invariance property of the item
parameters again fits this level of item calibration perfectly. A similar result was found for the 11
ratio items in Test 4.1 and is displayed in Table 6.3.

In conclusion, the results in Tables 5.1 and 5.2 show that many items in the essentially
unidimensional tests lacked item invariance. Table 5.3 shows that the items in multidimensional Test 1
fit the invariance property almost as well as those in the essentially unidimensional tests. Both
results indicate that Stout's essential unidimensionality assumption may not be a sufficient condition
for the existence of the invariance property of item parameters.
Table 5.3. Item Parameter Estimates a and b and Three Item Invariance Indices for Essentially
Multidimensional Test 1.

                 S1
Item          a       b

ys100      1.22    0.27
ys189      1.36   -0.28
ys151      0.92    1.54
ys076      0.95    0.81
ys165      1.15    0.09
ys167      0.66   -0.37
ys031      1.33   -0.74
ys069      1.91   -1.58
ys038      1.24    0.05
ys168      0.48    4.72
ys175      1.23    0.83
ys079      0.98    0.24
ys01?      1.54   -0.92
ys181      1.44   -0.49
ys045      0.77    1.19
ys012      1.71   -0.27
ys121      1.03   -0.29
ys086      1.12   -0.00
ys023      0.78    1.21
ys109      2.01   -1.00
ys127      0.77    0.91
ys122      1.62   -0.28
ys103      1.45   -0.04
?          1.54   -0.27
ys013      1.66    0.06
ys005      1.00   -0.66
ys149      1.32   -0.37
ys075      1.49   -0.46
ys068      1.69   -0.99
ys019      0.86    0.70
ys00?      2.03   -0.48
ys140      1.47    0.15
ys008      1.04   -0.26
ys179      1.19    0.60
ys196      1.08   -0.39
ys009      1.20   -0.09
ys043      1.43    0.43
ys046      1.18    0.01
ys028      1.13   -0.04
ys156      0.94    0.12
ys042      1.27   -1.28
ys030      1.16   -1.08
ys185      1.47   -0.75
ys087      0.61    2.56
ys195      1.65    0.01
ys025      0.71    2.37
ys101      0.95   -0.87
ys171      0.27    4.76
ys072      0.84    0.79
ys058      0.52    1.40
ys132      0.75   -2.30
ys097      0.82   -0.23
Table 5.3, remaining column entries (original column alignment lost):
 0.19  -1.37   0.16   0.80  -0.87   1.82   0.40  -0.01   0.98  -1.09
 0.27   0.21   6.47   0.78   0.67   0.86   0.38   0.80   0.42   0.85
 0.77   0.01   0.99  -0.11   0.90   0.10   0.91   0.55   1.66   2.59
 0.27   0.73   0.46   1.39   0.16   1.08  -1.79   3.77   0.15   1.41
 0.15   1.62   0.10   0.73  -0.00   3.33   0.18   1.65   0.09  -1.37
 0.17
Table 5.3 (Continued). Item Parameter Estimates a and b and Three Item Invariance Indices for
Essentially Multidimensional Test 1.

                 S1              S2
Item          a       b       a       b      χ²  p         ESA  p         EUA  p

ys187      1.59   -0.68    1.56   -0.61    0.75  0.68     0.73  0.46    -0.77  0.43
ys???         ?       ?       ?       ?    2.13  0.34     0.41  0.67     0.50  0.61
ys159      0.73   -0.66    0.67   -0.65    0.38  0.82     0.00  0.99    -0.53  0.59
?             ?       ?    0.80    1.86    1.98  0.37    -0.13  0.89     0.83  0.40
?          1.0?    0.56    1.10    0.67    2.17  0.33     0.89  0.37     1.35  0.17
ys077      0.74   -0.69    0.59   -0.53    4.63  0.09     0.87  0.38    -1.47  0.14
ys0??      1.44   -0.47    1.31   -0.20   11.27  0.00*    3.13  0.00*   -3.28  0.00*
ys070      0.67    0.76    0.71    1.04    4.16  0.12     1.35  0.17     1.88  0.05
ys10?      1.41   -0.63    1.30   -0.39    9.61  0.00*    2.61  0.00*   -2.82  0.00*
ys033      1.97   -1.28    2.05   -1.14    2.27  0.31     1.32  0.18     1.32  0.18
ys191      1.15    1.77    1.06    1.84    0.42  0.80     0.33  0.73    -0.51  0.60
ys048      1.13   -0.72    0.91   -0.82    2.47  0.29    -0.??  0.44     ?.50  0.13
ys???         ?       ?    1.16    0.11    2.36  0.30     1.53  0.12    -?.53  0.12
ys115      1.82    0.31    1.51    0.41    3.37  0.18     1.18  0.23    -1.85  0.06
ys022      1.15   -0.39    1.08   -0.33    0.85  0.65     0.65  0.50    -0.77  0.43
ys05?      1.74   -0.39    1.59   -0.16   10.09  0.00*    3.02  0.00*   -3.12  0.00*
ys???      ?.62    1.32    0.46    1.87    2.48  0.28     1.48  0.13    -1.47  0.14
ys???         ?       ?    1.59    0.17    0.82  0.66    -0.22  0.82     0.90  0.36
ys057      0.85    0.01    0.72    0.38    6.72  0.03*    2.55  0.01*   -2.03  0.04*
ys???         ?       ?    1.21   -0.13    3.04  0.21     1.??  0.09     1.65  ?
ys084      0.77    1.06    0.59    1.34    2.50  0.28     1.13  0.25    -1.50  0.13
ys???         ?       ?    0.54    2.34    7.95  0.0?*    2.14  0.03*   -2.?0  0.01*
ys014      1.4?   -0.27    1.28   -0.03    8.07  0.0?*    ?.76  0.00*   -2.85  0.00*

Note. Items underlined denote those violating the invariance property in the essentially
unidimensional tests.

The invariance property holds when the essentially unidimensional arithmetic tests are split into
three subtests, the smallest of which contains only six items (Table 6.1). This result contradicts the
findings of prior studies, which concluded that item parameters are more stable in a longer test than
in a shorter test (Shepard, Camilli, & Williams, 1985; Subkoviak, Mack, Ironson, & Craig, 1984). A
possible implication of this result is that unidimensionality has a stronger influence than the number
of items on the stability of item parameter estimation. A follow-up study is needed to clarify this
issue.



Table 6.1. Item Parameter Estimates a and b and Three Item Invariance Indices for Common Fractions in
Test 3.1.

                  a               b
Item         S1      S2      S1      S2      χ²  p         ESA  p         EUA  p

ys003         ?       ?       ?       ?     .84  .67      -.36  .72      -.89  .37
ys004      1.07    1.19   -1.50   -1.32     .24  .88       .49  .63       .44  .66
ys043      1.26    1.25     .72     .69     .09  .95      -.19  .85      -.30  .76
ys044       .76     .75     .58     .20    2.01  .37     -1.25  .21     -1.25  .21
ys185      1.24     .95    -.88    -.86    1.51  .47       .09  .93      -.97  .33
ys186       .52     .69   -2.02   -2.00    1.72  .42       .03  .98       .72  .47
Table 6.2. Item Parameter Estimates a and b and Three Item Invariance Indices for Decimal Fractions in
Test 2.1.

                  a               b
Item         S1      S2      S1      S2      χ²  p         ESA  p         EUA  p

ys005      1.01    1.04    -.53    -.48     .22  .90       .47  .64       .37  .71
ys006       .93    1.08     .56     .59     .48  .79       .09  .93       .54  .59
ys007      1.38    1.32    -.44    -.41     .05  .98       .15  .88      -.16  .87
ys045       .79     .81    1.22    1.15     .15  .93      -.31  .76       .27  .79
ys109      1.67    1.70    -.90    -.95     .46  .79      -.46  .65       .46  .65
ys140      1.31    1.30     .43     .45     .03  .99       .17  .86      -.15  .86
ys141       .76     .98     .11    -.01     .90  .64      -.55  .58       .85  .39


Table 6.3. Item Parameter Estimates a and b and Three Item Invariance Indices for Ratio Items in
Test 4.1.

                  a               b
Item         S1      S2      S1      S2      χ²  p         ESA  p         EUA  p

ys008       .95     .99     .05       ?       ?  .93         ?  .78       .33  ?
ys009      1.33    1.32    -.11    -.09     .05  .97       .23  .82      -.22  .82
ys046      1.31    1.46    -.03     .02    1.07  .59       .47  .64       .91  .36
ys047      1.31     .96    -.26    -.24    1.74  .42       .03  .98     -1.27  .20
ys079       .65     .95    1.79    1.94    4.92  .09       .26  .79      1.38  .17
ys110      1.00    1.00    -.02     .05     .07  .96       .27  .78      -.27  .78
ys142       .69     .68    2.09    1.81     .79  .67      -.48  .63       .46  .63
ys143       .78     .55    -.42    -.40    1.32  .52       .06  .95     -1.04  .30
ys190      1.23    1.12    -.26    -.25     .45  .80       .05  .96      -.67  .50
ys191      1.06    1.06    2.10    1.73    1.84  .40       .88  .38       .88  .38
ys192      1.33    1.19     .04     .05     .21  .90       .02  .99      -.45  .65



Conclusions and Discussion 

The reliability and validity of Stout's essential dimensionality statistics were examined in this
study. The stability of the two essential dimensionality measures was found to be low for some tests
across ten random samples. The cause of this instability is unclear because the effect of the
interaction between respondents and items is confounded with the effect of the reliability of Stout's
measures. If the cognitive ability space can be assumed to be the same across the random samples, we
can conclude that Stout's two essential dimensionality measures are somewhat unreliable. The essential
dimensionality results for the four tests across four ATI assignments were also different, which
indicates that the essential dimensionality estimate for a test is related to the characteristics of
the ATI items.

Two substantive findings emerged from the first analysis: first, the four algebra tests tend to be
more consistently identified as essentially unidimensional than their arithmetic counterparts, and
second, Stout's original essential dimensionality measure is more consistent than the refined statistic
proposed by Nandakumar (1993).

In the second section of the analysis, the effect of reducing the number of examinees and test items
was analyzed. It was found that reducing the sample size does not consistently improve the degree of
essential unidimensionality. A small sample size does, however, cause a fatal problem in running
DIMTEST. The degree of essential unidimensionality tended to increase when the number of test items and
the number of ATI items decreased. However, Test 2 was flagged as multidimensional even when the number
of ATI items was reduced to 4. Therefore, reducing the number of test items and ATI items does not
assure unidimensionality. The characteristics of the ATI items likely have more influence on the
essential dimensionality estimates.

The validity of replacing the IRT unidimensionality assumption with the essential dimensionality
assumption was assessed in the last section of the analysis, using the invariance of item parameters as
evidence. The relationships between the existence of the item invariance property and the essentially
unidimensional item calibrations (i.e., the arithmetic and algebra scales) were consistently found to
be low across test forms and mathematics areas. The degree of fit of the item invariance property for
the two "essentially unidimensional" tests (75% and 71%) is approximately the same as for the
"essentially multidimensional" Test 1 (69%). A possible interpretation is that, since the degree of
essential dimensionality of a test is related to the characteristics of the ATI items, the degree of
dimensionality of a test cannot be meaningfully determined unless "appropriate" ATI items are
selected. Therefore, a further study of the criteria for ATI items is needed to enhance the validity of
replacing the IRT unidimensionality assumption with the essential dimensionality assumption.

To determine the mathematical level at which the item invariance property holds, four essentially
unidimensional arithmetic tests were further split into three subtests. Three logically constructed
subtests across three test forms (i.e., common fractions in Test 3.1, decimal fractions in Test 2.1,
and ratios in Test 4.1) fit the item parameter invariance property consistently well.